Introduction

TODO

Datasets

Downloaded

The first dataset considered is the Steam Video Games Dataset. This dataset is a list of user behaviors, with columns: user-id, game-title, behavior-name, value. The behaviors included are ‘purchase’ and ‘play’. The value indicates the degree to which the behavior was performed. In the case of ‘purchase’ the value is always 1, and in the case of ‘play’ the value represents the number of hours the user has played the game.

user.id    game.title                  behavior.name  value
151603712  The Elder Scrolls V Skyrim  purchase       1.0
151603712  The Elder Scrolls V Skyrim  play           273.0
151603712  Fallout 4                   purchase       1.0
151603712  Fallout 4                   play           87.0
151603712  Spore                       purchase       1.0
151603712  Spore                       play           14.9

The dataset contains 200k entries covering over 12k different users and over 5k games. The number of purchases per user is heavily right-skewed (skewness 10.74), with a median of 2 and a mean of 10 game purchases per user. Playtime also varies widely between gamers, from users who played for less than 5 minutes to users with thousands of hours. The user with the most games owns 1552 of them, with a total playtime of 6778 hours, whereas the user with the most hours played spent 11906 hours on 433 different games. Unfortunately, the dataset does not record the period in which the hours were played.
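As a rough illustration of how the per-user summaries above could be computed, here is a minimal Python sketch. It assumes rows shaped like the table shown earlier; the extra users with ids 1 and 2 are invented, and the moment-based skewness formula is an assumption, since the report does not say which estimator produced the value 10.74.

```python
from collections import Counter
from statistics import mean, median

# Rows in the (user id, game title, behavior name, value) layout.
# The last two users are invented for the example.
rows = [
    (151603712, "The Elder Scrolls V Skyrim", "purchase", 1.0),
    (151603712, "The Elder Scrolls V Skyrim", "play", 273.0),
    (151603712, "Fallout 4", "purchase", 1.0),
    (151603712, "Fallout 4", "play", 87.0),
    (151603712, "Spore", "purchase", 1.0),
    (151603712, "Spore", "play", 14.9),
    (1, "Dota 2", "purchase", 1.0),
    (2, "Dota 2", "purchase", 1.0),
]

def purchases_per_user(rows):
    """Count the 'purchase' rows of each user id."""
    return Counter(user for user, _, behavior, _ in rows if behavior == "purchase")

def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (assumed estimator)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

counts = list(purchases_per_user(rows).values())
print(median(counts), mean(counts), skewness(counts))
```

A positive skewness together with a mean well above the median is the pattern described in the paragraph above: many users with few purchases and a long tail of heavy buyers.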

The second dataset used is the Steam games complete dataset. It lists 40k games, each with information about the genre, the developer, associated tags, the description, and more. For the purpose of this assignment we are interested only in a subset of the columns; for example, the URL of the Steam page is not useful for us. A glimpse of the data follows.

name                                        release_date  genre                                                   developer
DOOM                                        May 12, 2016  Action                                                  id Software
PLAYERUNKNOWN’S BATTLEGROUNDS               Dec 21, 2017  Action,Adventure,Massively Multiplayer                  PUBG Corporation
BATTLETECH                                  Apr 24, 2018  Action,Adventure,Strategy                               Harebrained Schemes
DayZ                                        Dec 13, 2018  Action,Adventure,Massively Multiplayer                  Bohemia Interactive
EVE Online                                  May 6, 2003   Action,Free to Play,Massively Multiplayer,RPG,Strategy  CCP
Grand Theft Auto V: Premium Online Edition  NaN           Action,Adventure                                        Rockstar North

Analyzed

Because I was interested in the connections between gamers and the types of games they play, I created two sub-datasets: users_info.csv is a subset of the first dataset, while games_info.csv is a subset of the second. They were built by joining the initial datasets so that, from now on, only games with at least one player are considered, and only players who play games for which we actually have details. In summary, we consider 2k games and 10k users, with over 90k user-game interactions (either “purchase” or “play”). Every user plays at least one of the 2k games, and every game has a description and at least one player.
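The filtering described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the report does not say how titles were matched, so the sketch assumes an exact match on the game title. The function name, the dictionary keys and the toy rows are all hypothetical.

```python
def restrict_to_shared_games(interactions, games):
    """Keep only interactions on games with details, and games with players."""
    known_titles = {game["name"] for game in games}
    kept_interactions = [r for r in interactions if r["game_title"] in known_titles]
    played_titles = {r["game_title"] for r in kept_interactions}
    kept_games = [game for game in games if game["name"] in played_titles]
    return kept_interactions, kept_games

# Invented toy rows; only the column meanings mirror the two datasets.
interactions = [
    {"user_id": 1, "game_title": "DOOM", "behavior_name": "purchase", "value": 1.0},
    {"user_id": 1, "game_title": "DOOM", "behavior_name": "play", "value": 12.5},
    {"user_id": 2, "game_title": "Spore", "behavior_name": "purchase", "value": 1.0},
]
games = [
    {"name": "DOOM", "genre": "Action"},
    {"name": "DayZ", "genre": "Action,Adventure,Massively Multiplayer"},
]

users_info, games_info = restrict_to_shared_games(interactions, games)
print(len(users_info), "interactions,", len(games_info), "games")
```

In practice titles may differ in case or punctuation between the two sources, so some normalization before matching would probably be needed.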

About the users

Type of gamers

Is it possible to distinguish groups of players by their behavior? Do most people buy a lot of games, or do they stick to a few favorites? The plot below hints at some answers to these questions.

The plot animation cycles between the scale of the data and a logarithmic scale, thus giving two different perspectives:

  • in the default scale we can see that the data is clustered near the origin, which means that the majority of users bought a small number of games and played them for few hours. The data summaries confirm this: a mean of 3.11 games (median 1), 98.5 hours (median 3.20) and 52.8 dollars spent (median 0). Also, there does not seem to be a linear relationship between number of games and playtime, which suggests that users tend to buy new games and play them only for a short time. It is worth noting that the money invested does not seem to be a reason to play, which is interesting especially given the linear relationship between the money spent and the number of games.

  • the log scale gives a better view of the 3 clusters (generated with k-means): the red cluster captures people with a low budget and little free time, while the blue cluster comprises people above the average but not as extreme as the gamers in the green cluster. The blue and green groups overlap a bit, but together they are clearly separated from the red group.
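To make the clustering step more concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on log-transformed features. It is not the code behind the plot. The user tuples are invented, the use of log1p (to cope with zero dollars spent) is an assumption, and the initial centroids are passed explicitly only to keep the run deterministic.

```python
import math

def kmeans(points, initial_centroids, iterations=50):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Invented users as (games bought, hours played, dollars spent).
users = [
    (1, 0.5, 0), (1, 3, 0), (2, 1, 10),
    (20, 150, 200), (30, 200, 350), (25, 90, 300),
    (500, 5000, 6000), (800, 7000, 9000), (1500, 6800, 12000),
]
log_users = [tuple(math.log1p(v) for v in user) for user in users]

centroids, labels = kmeans(log_users, [log_users[0], log_users[3], log_users[6]])
print(labels)
```

A real analysis would more likely rely on random restarts or a library implementation rather than hand-picked starting points.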

What about the highlighted gamers?

Taking a closer look at the highlighted users, the four most extreme ones, gives us the following plot.

It’s funny to see that the players with the highest number of hours spent all their time on a single game. Note also that both games are free to play.

The users in the first row spent a mean of 13 and 11 dollars on each game respectively, playing them for an average of 10 and 7 hours per game. The second user (upper right) bought 212 games and, excluding the five shown, spent an average of 5.1 hours per game. Note that very few games can be completed in so few hours.

An interactive plot follows to better explore the users’ preferences.

About the games

Do newer games have more playtime than older games? Is playtime evenly distributed? If not, which factors determine a higher playtime?

The majority of the games were released between 2013 and 2020, but there are two games far on the left of the plot:

  • Dragon’s Lair, an interactive film video game published in 1983. In the game the protagonist Dirk the Daring is a knight attempting to rescue Princess Daphne from the evil dragon Singe who has locked the princess in the foul wizard Mordroc’s castle. It featured animation by ex-Disney animator Don Bluth.

  • Space Ace was unveiled in October 1983, just four months after the Dragon’s Lair game, then released in Spring 1984, and like its predecessor featured film-quality animation. The gameplay is similar to Dragon’s Lair, requiring the player to move the joystick or press the fire button at key moments in the animated sequences to govern the hero’s actions.

These two are widely considered gems of video game history.

As for playtime, the distribution is almost normal if we ignore the huge number of games played for only a few minutes. It seems that a game is either played for a short time and then dropped, or played for about 100 hours. The games with the highest playtime are Team Fortress 2 and Dota 2, both FreeToPlay Action Multiplayer PvP games. The latter, since its 2013 release, has accumulated almost 1 million hours of playtime across 5k players. Team Fortress 2, released in 2007, accumulated more than 150k hours with fewer than 2500 users.

Noting that both top games are FreeToPlay raises a question: what kind of relationship links a game’s genre to its popularity? Which are the most popular genres, measuring popularity by purchases or by playtime?

First of all most games belong to at least two genres, and there are 22 different genres in the dataset.

## [1] "22 different genres"
##                    genre    n                 label           x          y     radius
## 1                  Indie 1223                 Indie -12.9125382  36.435853 19.7305091
## 2                 Action  900                Action -16.9256875   0.000000 16.9256875
## 3              Adventure  582             Adventure  13.6108910   0.000000 13.6108910
## 4               Strategy  496              Strategy -21.6911501 -29.103210 12.5650986
## 5                 Casual  415                Casual   1.2476294 -21.848935 11.4934156
## 6                    RPG  401                   RPG  38.4671972  -1.615949 11.2978876
## 7             Simulation  340            Simulation  26.9190508 -19.989164 10.4031419
## 8           Free to Play  164          Free to Play   8.6288586  20.231659  7.2251520
## 9  Massively Multiplayer   79 Massively Multiplayer  20.5122908  17.299728  5.0146267
## 10                Racing   78                Racing  28.4431564  11.212885  4.9827875
## 11                Sports   61                Sports  15.3366873 -29.218097  4.4064615
## 12          Early Access   56          Early Access  -0.4237928  13.225449  4.2220082
## 13             Utilities   17                       -28.1130178 -15.667780  2.3262132
## 14 Design & Illustration    9                         0.1837312   7.342156  1.6925688
## 15  Animation & Modeling    8                         0.1732234  -7.118399  1.5957691
## 16        Web Publishing    7                       -31.7622354 -10.913938  1.4927053
## 17      Video Production    4                       -29.8025167 -12.654509  1.1283792
## 18             Education    3                         4.4157491  11.325278  0.9772050
## 19     Software Training    3                        17.1494690 -14.152421  0.9772050
## 20      Audio Production    1                         2.0453155  -8.195753  0.5641896
## 21      Game Development    1                        -5.2037141  12.980414  0.5641896
## 22         Photo Editing    1                        21.4618837  11.802323  0.5641896
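The genre counts printed above can be obtained by splitting each game's comma-separated genre string and tallying the parts. The following Python sketch shows the idea on the six sample rows from the games table; the function name is hypothetical and this is not the code that produced the output above.

```python
from collections import Counter

# Genre strings of the six sample games shown in the games table.
genre_strings = [
    "Action",
    "Action,Adventure,Massively Multiplayer",
    "Action,Adventure,Strategy",
    "Action,Adventure,Massively Multiplayer",
    "Action,Free to Play,Massively Multiplayer,RPG,Strategy",
    "Action,Adventure",
]

def count_genres(genre_strings):
    """Count how many games list each genre."""
    counts = Counter()
    for text in genre_strings:
        # A set avoids counting a genre twice for the same game.
        counts.update({part.strip() for part in text.split(",") if part.strip()})
    return counts

genre_counts = count_genres(genre_strings)
print(len(genre_counts), "different genres")
for genre, n in genre_counts.most_common():
    print(genre, n)
```

Because a game with several genres contributes to several counts, the counts add up to more than the number of games.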

In the plot above the area of each circle is proportional to the number of games that belong to the genre. The most frequent genre is Indie with 1223 titles, followed by Action (900), Adventure (582) and Strategy (496). Note that FreeToPlay occupies the 8th position. This huge difference in the presence of the genres on the store may be caused by the different business strategies. Indie games are usually developed by small groups and sometimes by solo developers, whereas big FreeToPlay titles need a huge initial investment that is recovered by building player loyalty. The income of an Indie game comes from the number of units sold, while the income of a FreeToPlay game comes from in-game purchases of skins, cosmetics and items. So a FreeToPlay game adopts a strategy that is almost the opposite of the one adopted by an Indie game.

Also, the number of Indie games has increased dramatically in recent years because of the accessibility of powerful tools and the possibility of reaching a huge audience of customers.

Note the huge difference between the presence of Indie games and FreeToPlay (F2P) games over the years. Once an F2P game has been released, the developers stick with it and keep adding new content, sometimes even monthly. For a small Indie company, on the other hand, it is better to release smaller games frequently.

To confirm this difference in business strategy we should look at the playtime for every genre and check whether the Indie genre collected fewer hours than the F2P genre; the plot below aims at that.
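On the circle sizes: for the area of a circle to be proportional to a count n, its radius has to grow with the square root of n. The radii printed in the genre table are consistent with r = sqrt(n / pi), which is what this small Python sketch computes. The function name is hypothetical.

```python
import math

def radius_for_count(n):
    """Radius of a circle whose area equals n."""
    return math.sqrt(n / math.pi)

for genre, n in [("Indie", 1223), ("Action", 900), ("Free to Play", 164)]:
    print(f"{genre}: {radius_for_count(n):.4f}")
```

Scaling the radius directly with n would instead exaggerate the large genres, since the area would then grow with the square of n.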

TODO The blue columns. Note how Free to Play games are very present and also heavily played! This means that people play them somewhat casually, just for the sake of it.

Now we should figure out how many of the F2P games are Action and how many of the Indie games are Action, or more generally what kind of correlations exist between the genres.

"this is roughly text mining on the strings that define the genres" TODO remove the discussion about the ordering

Assuming that the order of the genre tags indicates the stronger membership (“Indie, Action” means that the game is more Indie than Action), we note that Indie is very strongly correlated with Action, which explains why both are so widespread. F2P, on the other hand, is detached from Indie: Indie games usually cost little because they are cheap to make, and so they are not given away for free. These are simply different marketing approaches. TODO adjust this comment in relation to the ordering of the genres

Here each edge represents the frequency with which the pair appears in the genres of a game, divided by the sum of the frequencies of the genres. If we relax the assumption about the order of the genres we can see that two clusters form: the first is made up of utility software, the second of actual games. Zooming in on the cluster of games

TODO comment. We see that the link between F2P and MMO is very strong. Zooming in further

TODO Comment on the plot
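The edge weight described above can be sketched in Python as follows. The text leaves the denominator somewhat open, so this sketch assumes it is the sum of the frequencies of the two genres in the pair, and it treats pairs as unordered (the relaxed assumption). The function name is hypothetical and the genre lists are those of the six sample games shown earlier.

```python
from collections import Counter
from itertools import combinations

def genre_edge_weights(genre_lists):
    """Weight of each genre pair: co-occurrences / (freq of a + freq of b)."""
    genre_freq = Counter()
    pair_freq = Counter()
    for genres in genre_lists:
        unique = sorted(set(genres))  # unordered pairs, each genre once per game
        genre_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return {
        (a, b): n / (genre_freq[a] + genre_freq[b])
        for (a, b), n in pair_freq.items()
    }

genre_lists = [
    ["Action"],
    ["Action", "Adventure", "Massively Multiplayer"],
    ["Action", "Adventure", "Strategy"],
    ["Action", "Adventure", "Massively Multiplayer"],
    ["Action", "Free to Play", "Massively Multiplayer", "RPG", "Strategy"],
    ["Action", "Adventure"],
]

weights = genre_edge_weights(genre_lists)
for pair, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(pair, round(weight, 3))
```

With this normalization a pair of rare genres that always appear together can weigh as much as a pair of common genres, which is one way a small cluster of utility software could stand out.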

About user-games interactions

TODO add comment add numerical summaries for centrality

TODO mention recommender systems!